{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "jt_Hqb95fRz8"
   },
   "source": [
    "# Slicing AutoML Tables Evaluation Results with BigQuery\n",
    "\n",
    "This colab assumes that you've created a dataset with AutoML Tables, and used that dataset to train a classification model. Once the model is done training, you also need to export the results table by using the following instructions. You'll see more detailed setup instructions below.\n",
    "\n",
    "This colab will walk you through the process of using BigQuery to visualize data slices, showing you one simple way to evaluate your model for bias.\n",
    "\n",
    "## Setup\n",
    "\n",
    "To use this Colab, copy it to your own Google Drive or open it in the Playground mode. Follow the instructions in the [AutoML Tables Product docs](https://cloud.google.com/automl-tables/docs/) to create a GCP project, enable the API, and create and download a service account private key, and set up required permission. You'll also need to use the AutoML Tables frontend or service to create a model and export its evaluation results to BigQuery. You should find a link on the Evaluate tab to view your evaluation results in BigQuery once you've finished training your model. Then navigate to BigQuery in your GCP console and you'll see your new results table in the list of tables to which your project has access. \n",
    "\n",
    "For demo purposes, we'll be using the [Default of Credit Card Clients](https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients) dataset for analysis. This dataset was collected to help compare different methods of predicting credit card default. Using this colab to analyze your own dataset may require a little adaptation.\n",
    "\n",
    "The code below will sample if you want it to. Or you can set sample_count to be as large or larger than your dataset to use the whole thing for analysis. \n",
    "\n",
    "Note also that although the data we use in this demo is public, you'll need to enter your own Google Cloud project ID in the parameter below to authenticate to it.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "m2oL8tO-f9rK"
   },
   "outputs": [],
   "source": [
    "from __future__ import absolute_import\n",
    "from __future__ import division\n",
    "from __future__ import print_function\n",
    "\n",
    "from google.colab import auth\n",
    "import numpy as np\n",
    "import os\n",
    "import pandas as pd\n",
    "import sys\n",
    "sys.path.append('./python')\n",
    "from sklearn.metrics import confusion_matrix\n",
    "from sklearn.metrics import accuracy_score, roc_curve, roc_auc_score\n",
    "from sklearn.metrics import precision_recall_curve\n",
    "# For facets\n",
    "from IPython.core.display import display, HTML\n",
    "import base64\n",
    "!pip install --upgrade tf-nightly witwidget\n",
    "import witwidget.notebook.visualization as visualization\n",
    "!pip install apache-beam\n",
    "!pip install --upgrade tensorflow_model_analysis\n",
    "!pip install --upgrade tensorflow\n",
    "\n",
    "import tensorflow as tf\n",
    "import tensorflow_model_analysis as tfma\n",
    "print('TFMA version: {}'.format(tfma.version.VERSION_STRING))\n",
    "\n",
    "# https://cloud.google.com/resource-manager/docs/creating-managing-projects\n",
    "project_id = '[YOUR PROJECT ID HERE]' #@param {type:\"string\"}\n",
    "table_name = 'bigquery-public-data:ml_datasets.credit_card_default' #@param {type:\"string\"}\n",
    "os.environ[\"GOOGLE_CLOUD_PROJECT\"]=project_id\n",
    "sample_count = 3000 #@param\n",
    "row_count = pd.io.gbq.read_gbq('''\n",
    "  SELECT \n",
    "    COUNT(*) as total\n",
    "  FROM [%s]''' % (table_name), project_id=project_id, verbose=False).total[0]\n",
    "df = pd.io.gbq.read_gbq('''\n",
    "  SELECT\n",
    "    *\n",
    "  FROM\n",
    "    [%s]\n",
    "  WHERE RAND() < %d/%d\n",
    "''' % (table_name, sample_count, row_count), project_id=project_id, verbose=False)\n",
    "print('Full dataset has %d rows' % row_count)\n",
    "df.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "608Fe8PRtj5q"
   },
   "source": [
    "##Data Preprocessing\n",
    "\n",
    "Many of the tools we use to analyze models and data expect to find their inputs in the [tensorflow.Example](https://www.tensorflow.org/tutorials/load_data/tf_records) format. Here, we'll preprocess our data into tf.Examples, and also extract the predicted class from our classifier, which is binary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "lqZeO9aGtn2s"
   },
   "outputs": [],
   "source": [
    "unique_id_field = 'ID' #@param\n",
    "prediction_field_score = 'predicted_default_payment_next_month_tables_score'  #@param\n",
    "prediction_field_value = 'predicted_default_payment_next_month_tables_value'  #@param\n",
    "\n",
    "\n",
    "def extract_top_class(prediction_tuples):\n",
    "  # values from Tables show up as a CSV of individual json (prediction, confidence) objects.\n",
    "  best_score = 0\n",
    "  best_class = u''\n",
    "  for val, sco in prediction_tuples:\n",
    "    if sco > best_score:\n",
    "      best_score = sco\n",
    "      best_class = val\n",
    "  return (best_class, best_score)\n",
    "\n",
    "def df_to_examples(df, columns=None):\n",
    "  examples = []\n",
    "  if columns == None:\n",
    "    columns = df.columns.values.tolist()\n",
    "  for id in df[unique_id_field].unique():\n",
    "    example = tf.train.Example()\n",
    "    prediction_tuples = zip(df.loc[df[unique_id_field] == id][prediction_field_value], df.loc[df[unique_id_field] == id][prediction_field_score])\n",
    "    row = df.loc[df[unique_id_field] == id].iloc[0]\n",
    "    for col in columns:\n",
    "      if col == prediction_field_score or col == prediction_field_value:\n",
    "        # Deal with prediction fields separately\n",
    "        continue\n",
    "      elif df[col].dtype is np.dtype(np.int64):\n",
    "        example.features.feature[col].int64_list.value.append(int(row[col]))\n",
    "      elif df[col].dtype is np.dtype(np.float64):\n",
    "        example.features.feature[col].float_list.value.append(row[col])\n",
    "      elif row[col] is None:\n",
    "        continue\n",
    "      elif row[col] == row[col]:\n",
    "        example.features.feature[col].bytes_list.value.append(row[col].encode('utf-8'))\n",
    "    cla, sco = extract_top_class(prediction_tuples)\n",
    "    example.features.feature['predicted_class'].int64_list.value.append(cla)\n",
    "    example.features.feature['predicted_class_score'].float_list.value.append(sco)\n",
    "    examples.append(example)\n",
    "  return examples\n",
    "\n",
    "# Fix up some types so analysis is consistent. This code is specific to the dataset.\n",
    "df = df.astype({\"PAY_5\": float, \"PAY_6\": float})\n",
    "\n",
    "# Converts a dataframe column into a column of 0's and 1's based on the provided test.\n",
    "def make_label_column_numeric(df, label_column, test):\n",
    "  df[label_column] = np.where(test(df[label_column]), 1, 0)\n",
    "  \n",
    "# Convert label types to numeric. This code is specific to the dataset.\n",
    "make_label_column_numeric(df, 'predicted_default_payment_next_month_tables_value', lambda val: val == '1')\n",
    "make_label_column_numeric(df, 'default_payment_next_month', lambda val:  val == '1')\n",
    "\n",
    "examples = df_to_examples(df)\n",
    "print(\"Preprocessing complete!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "XwnOX_orVZEs"
   },
   "source": [
    "## What-If Tool\n",
    "\n",
    "First, we'll explore the data and predictions using the [What-If Tool](https://pair-code.github.io/what-if-tool/). The What-If tool is a powerful visual interface to explore data, models, and predictions. Because we're reading our results from BigQuery, we aren't able to use the features of the What-If Tool that query the model directly. But we can still learn a lot about this dataset from the exploration that the What-If tool enables.\n",
    "\n",
    "Imagine that you're curious to discover whether there's a discrepancy in the predictive power of your model depending on the marital status of the person whose credit history is being analyzed. You can use the What-If Tool to look at a glance and see the relative sizes of the data samples for each class. In this dataset, the marital statuses are encoded as 1 = married; 2 = single; 3 = divorce; 0=others. You can see using the What-If Tool that there are very few samples for classes other than married or single, which might indicate that performance could be compromised. If this lack of representation concerns you, you could consider collecting more data for underrepresented classes, downsampling overrepresented classes, or upweighting underrepresented data types as you train, depending on your use case and data availability.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "tjWxGOBkVXQ6"
   },
   "outputs": [],
   "source": [
    "WitWidget = visualization.WitWidget\n",
    "WitConfigBuilder = visualization.WitConfigBuilder\n",
    "\n",
    "num_datapoints = 2965  #@param {type: \"number\"}\n",
    "tool_height_in_px = 700  #@param {type: \"number\"}\n",
    "\n",
    "# Setup the tool with the test examples and the trained classifier\n",
    "config_builder = WitConfigBuilder(examples[:num_datapoints])\n",
    "# Need to call this so we have inference_address and model_name initialized\n",
    "config_builder = config_builder.set_estimator_and_feature_spec('', '')\n",
    "config_builder = config_builder.set_compare_estimator_and_feature_spec('', '')\n",
    "wv = WitWidget(config_builder, height=tool_height_in_px)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "YHydLAY991Du"
   },
   "source": [
    "## Tensorflow Model Analysis\n",
    "\n",
    "Then, let's examine some sliced metrics. This section of the tutorial will use [TFMA](https://github.com/tensorflow/model-analysis) model agnostic analysis capabilities. \n",
    "\n",
    "TFMA generates sliced metrics graphs and confusion matrices. We can use these to dig deeper into the question of how well this model performs on different classes of marital status. The model was built to optimize for AUC ROC metric, and it does fairly well for all of the classes, though there is a small performance gap for the \"divorced\" category. But when we look at the AUC-PR metric slices, we can see that the \"divorced\" and \"other\" classes are very poorly served by the model compared to the more common classes. AUC-PR is the metric that measures how well the tradeoff between precision and recall is being made in the model's predictions. If we're concerned about this gap, we could consider retraining to use AUC-PR as the optimization metric and see whether that model does a better job making equitable predictions. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "ZfU11b0797le"
   },
   "outputs": [],
   "source": [
    "import apache_beam as beam\n",
    "import tempfile\n",
    "\n",
    "from collections import OrderedDict\n",
    "from google.protobuf import text_format\n",
    "from tensorflow_model_analysis import post_export_metrics\n",
    "from tensorflow_model_analysis import types\n",
    "from tensorflow_model_analysis.api import model_eval_lib\n",
    "from tensorflow_model_analysis.evaluators import aggregate\n",
    "from tensorflow_model_analysis.extractors import slice_key_extractor\n",
    "from tensorflow_model_analysis.model_agnostic_eval import model_agnostic_evaluate_graph\n",
    "from tensorflow_model_analysis.model_agnostic_eval import model_agnostic_extractor\n",
    "from tensorflow_model_analysis.model_agnostic_eval import model_agnostic_predict\n",
    "from tensorflow_model_analysis.proto import metrics_for_slice_pb2\n",
    "from tensorflow_model_analysis.slicer import slicer\n",
    "from tensorflow_model_analysis.view.widget_view import render_slicing_metrics\n",
    "\n",
    "# To set up model agnostic extraction, need to specify features and labels of\n",
    "# interest in a feature map.\n",
    "feature_map = OrderedDict();\n",
    "\n",
    "for i, column in enumerate(df.columns):\n",
    "  type = df.dtypes[i]\n",
    "  if column == prediction_field_score or column == prediction_field_value:\n",
    "    continue\n",
    "  elif (type == np.dtype(np.float64)):\n",
    "    feature_map[column] =  tf.FixedLenFeature([], tf.float32)\n",
    "  elif (type == np.dtype(np.object)):\n",
    "    feature_map[column] =  tf.FixedLenFeature([], tf.string)\n",
    "  elif (type == np.dtype(np.int64)):\n",
    "    feature_map[column] = tf.FixedLenFeature([], tf.int64)\n",
    "  elif (type == np.dtype(np.bool)):\n",
    "    feature_map[column] = tf.FixedLenFeature([], tf.bool)\n",
    "  elif (type == np.dtype(np.datetime64)):\n",
    "    feature_map[column] = tf.FixedLenFeature([], tf.timestamp)\n",
    "\n",
    "feature_map['predicted_class'] = tf.FixedLenFeature([], tf.int64)\n",
    "feature_map['predicted_class_score'] = tf.FixedLenFeature([], tf.float32)\n",
    "\n",
    "serialized_examples = [e.SerializeToString() for e in examples]\n",
    "\n",
    "BASE_DIR = tempfile.gettempdir()\n",
    "OUTPUT_DIR = os.path.join(BASE_DIR, 'output')\n",
    "\n",
    "slice_column = 'MARRIAGE' #@param\n",
    "predicted_labels = 'predicted_class' #@param\n",
    "actual_labels = 'default_payment_next_month' #@param\n",
    "predicted_class_score = 'predicted_class_score' #@param\n",
    "\n",
    "with beam.Pipeline() as pipeline:\n",
    "  model_agnostic_config = model_agnostic_predict.ModelAgnosticConfig(\n",
    "            label_keys=[actual_labels],\n",
    "            prediction_keys=[predicted_labels],\n",
    "            feature_spec=feature_map)\n",
    "  \n",
    "  extractors = [\n",
    "          model_agnostic_extractor.ModelAgnosticExtractor(\n",
    "              model_agnostic_config=model_agnostic_config,\n",
    "              desired_batch_size=3),\n",
    "           slice_key_extractor.SliceKeyExtractor([\n",
    "               slicer.SingleSliceSpec(columns=[slice_column])\n",
    "           ])\n",
    "      ]\n",
    "\n",
    "  auc_roc_callback = post_export_metrics.auc(\n",
    "      labels_key=actual_labels,\n",
    "      target_prediction_keys=[predicted_labels])\n",
    "  \n",
    "  auc_pr_callback = post_export_metrics.auc(\n",
    "      curve='PR',\n",
    "      labels_key=actual_labels,\n",
    "      target_prediction_keys=[predicted_labels])\n",
    "  \n",
    "  confusion_matrix_callback = post_export_metrics.confusion_matrix_at_thresholds(\n",
    "      labels_key=actual_labels,\n",
    "      target_prediction_keys=[predicted_labels],\n",
    "      example_weight_key=predicted_class_score,\n",
    "      thresholds=[0.0, 0.5, 0.8, 1.0])\n",
    "\n",
    "  # Create our model agnostic aggregator.\n",
    "  eval_shared_model = types.EvalSharedModel(\n",
    "      construct_fn=model_agnostic_evaluate_graph.make_construct_fn(\n",
    "          add_metrics_callbacks=[confusion_matrix_callback,\n",
    "                                 auc_roc_callback,\n",
    "                                 auc_pr_callback,\n",
    "                                 post_export_metrics.example_count()],\n",
    "          fpl_feed_config=model_agnostic_extractor\n",
    "          .ModelAgnosticGetFPLFeedConfig(model_agnostic_config)))\n",
    "\n",
    "  # Run Model Agnostic Eval.\n",
    "  _ = (\n",
    "      pipeline\n",
    "      | beam.Create(serialized_examples)\n",
    "      | 'ExtractEvaluateAndWriteResults' >>\n",
    "        model_eval_lib.ExtractEvaluateAndWriteResults(\n",
    "            eval_shared_model=eval_shared_model,\n",
    "            output_path=OUTPUT_DIR,\n",
    "            extractors=extractors))\n",
    "    \n",
    "\n",
    "eval_result = tfma.load_eval_result(output_path=OUTPUT_DIR)\n",
    "render_slicing_metrics(eval_result,  slicing_column = slice_column)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "mOotC2D5Onqu"
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "last_runtime": {
    "build_target": "//learning/fairness/colabs:ml_fairness_notebook",
    "kind": "shared"
   },
   "name": "slicing_eval_results.ipynb",
   "provenance": [
    {
     "file_id": "1goi268plF-1AJ77xjdMwIpapBr1ssb-q",
     "timestamp": 1551899111384
    },
    {
     "file_id": "/piper/depot/google3/cloud/ml/autoflow/colab/slicing_eval_results.ipynb?workspaceId=simonewu:autoflow-1::citc",
     "timestamp": 1547767618990
    },
    {
     "file_id": "1fjkKgZq5iMevPnfiIpSHSiSiw5XimZ1C",
     "timestamp": 1547596565571
    }
   ],
   "version": "0.3.2"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
